Normalized Distance Matrix Method for Construction of Phylogenetic Trees Using New Compressor - Dnabit Compress

Author

  • ALLAM APPARAO
Abstract

We define a compression distance based on a normal compressor and show that it is an admissible distance. The first theme concerns the statistical significance of compressed file sizes: only in recent years have scientists begun to appreciate that compression ratios carry a great deal of important statistical information. In applying the approach, we use a new DNA sequence compressor, “DNABIT Compress”, as the compressor C. Because compression algorithms can be used to approximate Kolmogorov complexity, a compressor C approximates the information distance E(x,y), which is based on Kolmogorov complexity, by the compression distance EC(x,y). The normalized compression distance, an efficiently computable and thus practically applicable form of the normalized information distance, is used to calculate a distance matrix. In this paper, this new distance matrix is proposed for reconstructing phylogenetic trees. Phylogenies are the main tool for representing the relationships among biological entities. Phylogenetic reconstruction methods attempt to find the evolutionary history of a given set of species. This history is usually described by an edge-weighted tree, where edges correspond to different branches of evolution and the weight of an edge corresponds to the amount of evolutionary change on that branch. We constructed a phylogenetic tree from BChE DNA sequences of mammals by passing the proposed distance matrix, computed with the DNABIT compressor, to the Neighbor-Joining (NJ) algorithm.
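The distance-matrix step described in the abstract can be illustrated with a minimal sketch. It assumes the standard definition of the normalized compression distance, NCD(x,y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed size of s. DNABIT Compress itself is not reproduced here; Python's `zlib` is used purely as a stand-in compressor, and the function names `compressed_size`, `ncd`, and `ncd_matrix` are hypothetical, not taken from the paper.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input.

    Assumption: zlib stands in for the compressor C; the paper uses
    DNABIT Compress, which is not implemented here.
    """
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences."""
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compressed size of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(sequences: list[str]) -> list[list[float]]:
    """Symmetric pairwise NCD matrix.

    The diagonal is set to 0.0 by convention, and only the upper triangle
    is computed, because a real compressor gives C(xy) != C(yx) in general.
    """
    n = len(sequences)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = ncd(sequences[i], sequences[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

A matrix produced this way could then be passed to any Neighbor-Joining implementation. With a general-purpose compressor such as zlib the distances are only a rough approximation, which is part of the motivation for a DNA-specific compressor.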

Similar resources

A novel k-word relative measure for sequence comparison

To extract phylogenetic information from DNA sequences, a new normalized k-word average relative distance is proposed in this paper. The proposed measure was tested by discriminant analysis and phylogenetic analysis. The phylogenetic trees based on the Manhattan distance measure are reconstructed with k ranging from 1 to 12. At the same time, a new method is suggested to reduce the m...

DNABIT Compress – Genome compression algorithm

Data compression is concerned with how information is organized in data. Efficient storage means removal of redundancy from the data being stored in the DNA molecule. Data compression algorithms remove redundancy and are used to understand biologically important molecules. We present a compression algorithm, "DNABIT Compress" for DNA sequences based on a novel algorithm of assigning binary bits...

Phylogenetic Tree Construction for Y-DNA Haplogroups

The male Y chromosome is currently used to estimate the paternal ancestry and migratory patterns of humans. Y-chromosomal Short Tandem Repeat (STR) segments provide important data for reconstructing phylogenetic trees. However, STR data are not widely used for phylogeny because appropriate methodology is lacking. We propose a three-step method for analyzing large numbers of STR data and cons...

Image Analysis in a Parameter-Free Setting

The paper proposes a new method to approximate the normalized information distance by a compression method that is particularly suited for image data. The new method is based on a video compressor. The new method is used to compute the distance matrix of all the images in the data sets considered. Moreover, the hierarchical clustering method from the R package is used to cluster the distance ma...

A Novel Genetic Algorithm based Approach for Optimization of Distance Matrix for Phylogenetic Tree Construction

Phylogenies are useful for organizing knowledge of biological diversity, for structuring classifications, and for providing knowledge of events that occurred during evolution. Different phylogenetic reconstruction techniques are available. In this paper, a distance-based technique is used. The distance measure is an important issue in phylogenetic analysis. Traditional approaches are time-consuming du...

Publication date: 2011